The cost of dynamical quark simulations with improved staggered quarks is estimated based on current and 
planned running by the MILC collaboration. I find that a few 10s of Teraflop years should be sufficient to calculate 
down to a lattice spacing of 0.045 fm. 


1. INTRODUCTION 

Several caveats should be pointed out to the
reader. First, I think that the past 20 years of my
life are evidence of my inability to estimate the
time and effort necessary to calculate the spectrum
of QCD. Used to doing analytic calculations
that took a few months at most, I did not imagine
as I started my first Monte Carlo calculation that
20 years later I would still be doing similar calculations.
I was a postdoc at Fermilab then, and
as I write this I am back at Fermilab on sabbatical.
With great pleasure I see how far we have
come, and look forward to an exciting future of
improved calculations.

Caveat two is that this write-up is not a transcript
of what I said in Berlin. Caveat three is
that this does not represent a MILC consensus
statement. I did my best to extract from past experience
what is required for future calculations,
but the whole collaboration has not had an opportunity
to check or react.

2. TIME ESTIMATE 

Since the CG routine is no longer so dominant,
the formula for counting operations is not quite so
simple. For a 2+1 flavor run, let: $\tau$ = # of time
units per independent configuration; $O_F$ = # of
operations for fermion force per site; $O_{FL}$ = # of
operations for fat link calculation per site; $O_{CG}$ =
# of operations per CG iteration per site; $N^s_{CG}$ =
# of iterations to solve CG for the s-quark. Denoting
the quark masses $m_s$ and $m_l$, we use a time step
$\Delta t = \frac{2}{3} m_l$, and we expect $N^l_{CG} = N^s_{CG} \cdot m_s/m_l$.
The operation count is

\[
\text{ops per site} = \frac{\tau}{\Delta t} \left[ \left( O_F + O_{FL} + O_{CG} N^s_{CG} \right)
+ O_{CG} N^s_{CG} \, \frac{m_s}{m_l} \right] \qquad (1)
\]

2.1. Production running 

MILC, already in production with three dynamical
quarks, has completed runs with lattice
spacing 0.2 and 0.13 fm [1]. The Asqtad action
has leading errors of order $a^2 g^2$ and uses tadpole
improved coefficients in the action [2]. A
series of runs was done to allow a smooth interpolation
between the quenched approximation
and the 2+1 flavor world. The coupling is tuned
as the quark masses are reduced to fix a length
scale determined from the heavy quark potential
[3]. We have eight dynamical runs. In four, there
are three degenerate quarks with mass $8m_s$, $4m_s$,
$2m_s$ or $m_s$. In four runs, the dynamical strange
quark is fixed at $m_s$ and the light quark mass is
0.8, 0.6, 0.4 or 0.2 times $m_s$.

Runs are ≈ 2000-3000 molecular dynamics
time units. One significant issue is how the autocorrelation
time scales as the lattice spacing is reduced.
We cannot yet provide numerical evidence
for this scaling law. Currently, we are saving every
sixth trajectory. We see some autocorrelation
between successive lattices. For analysis, we bin
in groups of four, i.e., 24 time units.
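
The binning step can be sketched as follows. This is only an illustration, not the MILC analysis code: the function name `binned_standard_error` and the synthetic autocorrelated series are hypothetical, and only the bin size of four saved lattices (24 time units) comes from the text.

```python
import math
import random
import statistics


def binned_standard_error(samples, bin_size):
    """Average consecutive measurements into bins, then return
    (mean of the bin averages, standard error of that mean)."""
    n_bins = len(samples) // bin_size          # any incomplete final bin is dropped
    bins = [
        statistics.fmean(samples[i * bin_size:(i + 1) * bin_size])
        for i in range(n_bins)
    ]
    mean = statistics.fmean(bins)
    err = statistics.stdev(bins) / math.sqrt(n_bins)
    return mean, err


# Hypothetical data: an AR(1) series standing in for measurements on
# lattices saved every 6 time units, so that neighbours are correlated.
rng = random.Random(0)
x, series = 0.0, []
for _ in range(400):
    x = 0.7 * x + rng.gauss(0.0, 1.0)
    series.append(x)

_, unbinned_err = binned_standard_error(series, 1)
_, binned_err = binned_standard_error(series, 4)   # 4 lattices = 24 time units
print(f"unbinned error: {unbinned_err:.3f}, binned error: {binned_err:.3f}")
```

For positively correlated measurements the binned error is typically the larger and more honest of the two, which is the reason for binning before quoting uncertainties.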

Also important is how many independent configurations
are needed to achieve the desired accuracy.
This number clearly depends on the quark
masses and the quantity under study, but we (the
lattice community) may not be aware of its dependence
on such quantities as the action and
volume. As an example of the latter, MILC has
done some extensive tests of finite size effects in
the past. It is much easier to get accurate masses
on large volumes than on the smaller ones. Thus,
we need more configurations for the smaller volumes.
In our projects, we often use big volumes
compared to some groups using Wilson or Clover
quarks. We may need fewer independent configurations
to achieve the same accuracy. (Not to
mention no finite-size effects.)

Table 1 has timing estimates for some current
runs at 0.09 fm. We print timing for the conjugate
gradient routine, but not for other parts of
the code. We run on several different machines,
but this estimate is based on the assumption of
a speed of 200 MF/CPU. To get the operation
count we will assume the entire code is running
at 200 MF, not just the CG (the only part regularly
timed). The table contains the number of
node hours required to create a configuration we
will store given the parameters we use for time
step and residual. The lattice volume is $28^3 \times 96$.
In the first line, the quarks are degenerate, so we
need only one quark field and $N_f = 3$. (For Asqtad,
$O_F \approx 420$K, $O_{FL} \approx 51$K, $O_{CG} = 1187$ and
$N^s_{CG} = 236$, giving an opcount within 30% of the
table value.)
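
The "within 30%" comparison can be reproduced roughly as below. This is a sketch under stated assumptions, not the author's calculation. It reads the quoted 1187 as the per-site cost of one CG iteration and 236 as the iteration count. It takes 6 time units per stored configuration, since every sixth trajectory is saved. It assumes one force, one fat-link and one CG solve per time step for the single quark field. Finally, it assumes a strange quark mass of 0.031 in lattice units, a value that is not given in the text.

```python
# Values quoted in the text for the Asqtad action.
O_F = 420_000        # fermion force, operations per site
O_FL = 51_000        # fat link calculation, operations per site
O_CG = 1187          # operations per CG iteration per site
N_CG = 236           # CG iterations per solve
VOLUME = 28**3 * 96  # lattice sites
TAU = 6.0            # time units between stored configurations

# ASSUMPTION (not stated in the text): strange quark mass in lattice units.
AM_S = 0.031


def ops_per_config(am_quark, tau=TAU, volume=VOLUME):
    """Operations per stored configuration for one quark field,
    assuming one force, one fat-link and one CG solve per time step."""
    dt = (2.0 / 3.0) * am_quark            # time step rule from the text
    n_steps = tau / dt
    per_site_per_step = O_F + O_FL + O_CG * N_CG
    return n_steps * volume * per_site_per_step


estimate = ops_per_config(AM_S)
table_value = 6.5e14                        # degenerate run, flops per configuration
print(f"estimate: {estimate:.2e} flops/conf, "
      f"ratio to table: {estimate / table_value:.2f}")
```

With these inputs the estimate is about 4.6 x 10^14 operations per configuration, somewhat below the timed value. That is the direction one would expect if parts of the code other than force, fat links and CG also take time.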

For the other two runs, we need two quark
fields, with $N_f = 2$ and 1. In the last run, we
reduce the light quark mass by a factor of two
compared to the one above it. Traditional scaling
laws would predict a factor of 4 increase in computation
from halving the time step and doubling
the number of CG iterations; however, as the code
is no longer dominated by the CG routine, the
time only increases by a factor of 2.4. For the lightest mass
run, we plan to store 400 configurations. This
amounts to 0.145 TF-years.
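
The relation between the two numeric columns of Table 1, and the factor of 2.4, can be checked with a few lines. This is a minimal sketch: the dictionary keys and function name are illustrative, and the only inputs are the node-hour figures from the table and the assumed 200 MF/CPU.

```python
FLOPS_PER_NODE = 200e6      # assumed sustained speed from the text (200 MF/CPU)
SECONDS_PER_HOUR = 3600

# Light quark mass label -> node-hours per stored configuration (Table 1).
node_hours = {"m_s": 900, "0.4 m_s": 4096, "0.2 m_s": 9780}


def flops_per_config(hours):
    """Convert node-hours to floating point operations at the assumed speed."""
    return hours * SECONDS_PER_HOUR * FLOPS_PER_NODE


for label, hours in node_hours.items():
    print(f"{label:>8}: {flops_per_config(hours) / 1e14:5.1f} x 10^14 flops/conf")

# Cost ratio when the light mass is halved from 0.4 m_s to 0.2 m_s,
# to be compared with the traditional factor of 4.
ratio = node_hours["0.2 m_s"] / node_hours["0.4 m_s"]
print(f"measured ratio: {ratio:.1f} (traditional scaling: 4)")
```

The conversion reproduces the flops column (6.5, 29.5 and 70.4 in units of 10^14), and the ratio of the last two rows comes out at about 2.4.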

Table 1. Computational requirement of runs

$m_{u,d}$     node-hr/conf    flops ($10^{14}$/conf)
$m_s$              900              6.5
$0.4 m_s$         4096             29.5
$0.2 m_s$         9780             70.4

Now we attempt to address the issue of what
it would take to do a calculation of the quality
of the CPPACS quenched Wilson quark calculation.
CPPACS states that their smallest lattice
spacing is 0.05 fm. However, they only have 150
configurations there, and their error bars are significantly
larger than at the stronger couplings.
I have always wondered how their continuum extrapolation
would change if either the smallest or
largest $a$ were eliminated. Is the hardest part of the
calculation important in reducing the final error?
If we halve our current lattice spacing to get to
0.045 fm, we will be somewhat closer to the continuum
limit than they were. Assuming we generate
400 configurations, we can roughly estimate
the time required by multiplying our current light
mass run by a factor of $2^6$ to $2^8$. The smaller
factor assumes four powers from the volume, one
from the time step and another from the CG iterations
(which may be an overestimate as the CG
no longer dominates). The larger factor allows
for a doubling of the autocorrelation time, and
doubles the time for additional runs at heavier
masses. This yields an estimate of 10-40 TF-yr
of running. CPPACS's next to smallest $a$ was
0.064 fm. If we were to go to a lattice spacing of
0.06 fm, the increase in time from our present run
would be a factor of 11 to 26 depending on how things
scale. This would only require about 2-4 TF-yr.
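
The two projections follow from one scaling rule, sketched here. The function name is illustrative, and reading the quoted factors of 11 and 26 as $(0.09/0.06)^6$ and $(0.09/0.06)^8$ is an inference from the numbers rather than something the text states.

```python
BASE_TF_YEARS = 0.145   # lightest-mass run at a = 0.09 fm, 400 configurations (from the text)
A_CURRENT = 0.09        # current lattice spacing in fm


def projected_cost(a_target, power):
    """Scale the current cost by (a_current / a_target) ** power."""
    return BASE_TF_YEARS * (A_CURRENT / a_target) ** power


for a_target in (0.045, 0.06):
    # power 6: volume (4) + time step (1) + CG iterations (1)
    low = projected_cost(a_target, 6)
    # power 8: additionally autocorrelation time (1) + heavier-mass runs (1)
    high = projected_cost(a_target, 8)
    print(f"a = {a_target} fm: {low:.1f} to {high:.1f} TF-yr")
```

Under these assumptions the ranges come out near 9 to 37 TF-yr at 0.045 fm and a little under 2 to just under 4 TF-yr at 0.06 fm, in line with the rounded figures quoted above.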

Despite my first caveat and the surprise of
many in the audience, I believe these are reasonable
estimates for the runs outlined. MILC has
not had a dedicated computer, but we have been
able to do significant calculations partly because
staggered quarks do not require as much computation
as Wilson/clover.